chore: promote staging to staging-promote/ec04354c-23271447493 (2026-03-19 04:37 UTC)#1396

Merged
henrypark133 merged 7 commits into staging-promote/ec04354c-23271447493 from staging-promote/3dcccc1e-23280048384
Mar 19, 2026

Conversation

@ironclaw-ci ironclaw-ci bot commented Mar 19, 2026

Auto-promotion from staging CI

Batch range: 428303af1128e7f124ad623fc1338393a4d06fcc..3dcccc1e64ea92fef2a44cf413b7cf974821da96
Promotion branch: staging-promote/3dcccc1e-23280048384
Base: staging-promote/ec04354c-23271447493
Triggered by: Staging CI batch at 2026-03-19 04:37 UTC

Commits in this batch (22):

Current commits in this promotion (5)

Current base: staging-promote/ec04354c-23271447493
Current head: staging-promote/3dcccc1e-23280048384
Current range: origin/staging-promote/ec04354c-23271447493..origin/staging-promote/3dcccc1e-23280048384

Auto-updated by staging promotion metadata workflow

Waiting for gates:

  • Tests: pending
  • E2E: pending
  • Claude Code review: pending (will post comments on this PR)

Auto-created by staging-ci workflow

zmanian and others added 3 commits March 18, 2026 20:37
* feat(telegram): support auto split large message

* fix(telegram): strengthen split_message test assertion

Replace word-by-word contains check with assert_eq! on rejoined chunks,
ensuring split_message preserves content exactly.

send_response is still used (lines 745, 753) so it is intentionally kept.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(telegram): add missing split_message tests and document limitations

- Add test for sentence-boundary splitting
- Add test for hard-cut on pathological input (no spaces)
- Add test for multi-byte character safety (emoji)
- Document CJK sentence punctuation limitation
- Document trim behavior at chunk boundaries
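
The splitting behavior these tests cover (whitespace-preferred splits, hard cuts on input without spaces, multi-byte safety, trimming at chunk boundaries) can be illustrated with a minimal sketch. This is not the code from the PR: the signature, the `max_chars` parameter, and the choice to count Unicode scalar values are assumptions made for illustration, and sentence-boundary handling is left out.

```rust
/// Hypothetical sketch of a length-limited splitter; not the PR's implementation.
/// `max_chars` counts Unicode scalar values, not bytes (an assumption).
pub fn split_message(text: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");
    let mut chunks = Vec::new();
    let mut remaining = text.trim();
    while !remaining.is_empty() {
        // Byte offset just past the first `max_chars` characters,
        // or the end of the text when it is shorter than that.
        let window_end = remaining
            .char_indices()
            .nth(max_chars)
            .map(|(idx, _)| idx)
            .unwrap_or(remaining.len());
        if window_end == remaining.len() {
            chunks.push(remaining.to_string());
            break;
        }
        let window = &remaining[..window_end];
        // Prefer the last whitespace inside the window; otherwise hard-cut
        // at the character boundary so multi-byte characters stay intact.
        let split_at = window
            .rfind(char::is_whitespace)
            .filter(|&idx| idx > 0)
            .unwrap_or(window_end);
        // Trimming drops whitespace at the split point, so rejoined chunks
        // are not guaranteed to equal the original text byte for byte.
        chunks.push(remaining[..split_at].trim_end().to_string());
        remaining = remaining[split_at..].trim_start();
    }
    chunks
}
```

Because `window_end` always comes from `char_indices()`, every slice boundary falls on a valid UTF-8 character boundary, which is the property the emoji test guards.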

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: re-trigger CI with latest changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Hans <me@hans00.me>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(testing): add FaultInjector framework for StubLlm (#1220)

Adds a configurable fault injection framework for testing retry, failover,
and circuit breaker behavior. The FaultInjector attaches to StubLlm and
provides per-call control over failure type, timing, and sequencing.

Components:
- FaultType: maps to LlmError variants (RequestFailed, RateLimited,
  AuthFailed, InvalidResponse, IoError, ContextLengthExceeded, SessionExpired)
- FaultAction: Succeed, Fail(FaultType), Delay(Duration)
- FaultMode: SequenceOnce (play then succeed), SequenceLoop (repeat forever),
  Random (seeded xorshift64 PRNG for reproducibility)
- FaultInjector: thread-safe (AtomicU32 counter + Mutex RNG)
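
A rough sketch of how the two sequence modes might behave, assuming a single shared call counter. The type and method names mirror the list above, but the field layout, the `next_action` method, and the simplified `Fail` variant (without its FaultType payload) are hypothetical.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Simplified stand-in for the action type named above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FaultAction {
    Succeed,
    Fail, // the real variant carries a FaultType; omitted in this sketch
}

pub enum FaultMode {
    SequenceOnce(Vec<FaultAction>),
    SequenceLoop(Vec<FaultAction>),
}

pub struct FaultInjector {
    mode: FaultMode,
    calls: AtomicU32,
}

impl FaultInjector {
    pub fn new(mode: FaultMode) -> Self {
        Self { mode, calls: AtomicU32::new(0) }
    }

    /// Returns the action for this call and advances the shared counter
    /// (hypothetical method name).
    pub fn next_action(&self) -> FaultAction {
        let n = self.calls.fetch_add(1, Ordering::SeqCst) as usize;
        match &self.mode {
            // Play the sequence once, then succeed on every later call.
            FaultMode::SequenceOnce(seq) => seq.get(n).copied().unwrap_or(FaultAction::Succeed),
            // An empty looping sequence always succeeds.
            FaultMode::SequenceLoop(seq) if seq.is_empty() => FaultAction::Succeed,
            // Otherwise repeat the sequence forever.
            FaultMode::SequenceLoop(seq) => seq[n % seq.len()],
        }
    }
}
```

Taking `&self` and using an atomic counter lets several tasks share one injector without a lock on the sequence path.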

Integration:
- StubLlm gains optional fault_injector field via with_fault_injector()
- When set, takes precedence over should_fail/error_kind
- Backward compatible: existing StubLlm usage unchanged

Closes #1220

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(testing): address review feedback on FaultInjector

- Remove redundant .abs() in random fault comparison
- Extract check_faults() helper to DRY up StubLlm methods
- Guard xorshift seed=0 (fixed point) by mapping to 1
- Add StubLlm integration test (stub_llm_fault_injector_sequence)
- Remove dead seed field from FaultMode::Random
- Move pub mod fault_injection to top of mod.rs
- Add Debug impl for FaultInjector
- Add empty_sequence_always_succeeds test
- Add random_seed_zero_does_not_always_fail test
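
The seed=0 guard exists because zero is a fixed point of xorshift: shifting and XOR-ing zero yields zero forever. A minimal sketch of a guarded generator follows; the struct name and the (13, 7, 17) shift triple are assumptions, since the commit message does not state which constants are used.

```rust
/// Minimal xorshift64 generator with a seed-0 guard (illustrative only).
pub struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // A zero state would produce 0 on every call, so map it to 1.
        Self { state: if seed == 0 { 1 } else { seed } }
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}
```

Two generators built from the same seed produce the same stream, which is what makes seeded random faults reproducible across test runs.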

* fix(testing): address #1233 review -- seed-0 bug, reset(), Debug derive

- Store seed in FaultMode::Random so reset() can re-init the RNG
- Add reset() method for test reproducibility (re-seeds RNG, zeros counter)
- Strengthen seed=0 regression test to 100 iterations with stricter assertion
- Add reset_restores_random_rng_from_stored_seed test
- Debug impl and empty_sequence test were already present from prior commit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: re-trigger CI with latest changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: trigger new run with skip-regression-check label

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(testing): address PR #1233 review -- error_rate validation and edge cases

- Validate error_rate is in 0.0..=1.0 and not NaN (panics on invalid input)
- Fix error_rate==1.0 edge case: use <= instead of < so 1.0 always fails
- Add regression tests for error_rate validation (NaN, negative, >1.0)
- Add tests for error_rate boundary values (0.0 never fails, 1.0 always fails)
- Add delay action test using tokio::time::pause() for deterministic timing
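
The range validation described above can be sketched as follows. The function name is hypothetical, and only the validation step is shown, not the comparison against the random sample.

```rust
/// Panics unless `error_rate` lies in 0.0..=1.0 (hypothetical helper name).
pub fn validate_error_rate(error_rate: f64) -> f64 {
    // NaN fails every ordering comparison, so `contains` rejects it
    // without a separate `is_nan` check.
    assert!(
        (0.0..=1.0).contains(&error_rate),
        "error_rate must be within 0.0..=1.0, got {}",
        error_rate
    );
    error_rate
}
```

Panicking at construction time surfaces a misconfigured test immediately instead of letting it run with a rate that silently never or always fails.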

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(self-repair): wire stuck_threshold, store, and builder (#647)

Wire the previously dead-code fields in DefaultSelfRepair:

- stuck_threshold: detect_stuck_jobs() now filters by duration, only
  reporting jobs stuck longer than the configured threshold
- with_store(): wired in agent_loop.rs from AgentDeps.store for
  tool failure tracking via Database trait
- with_builder(): wired from register_builder_tool() return value
  through AppComponents and AgentDeps for automatic tool rebuilding
- tools: passed alongside builder for hot-reload logging

Remove all #[allow(dead_code)] annotations. Add regression tests for
threshold-based filtering (both above and below threshold).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add missing `builder` field to AgentDeps in gateway workflow harness

After rebase onto staging, AgentDeps gained a `builder` field for
self-repair tool rebuilding. The gateway workflow test harness was
missing this field, causing CI compilation failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: retrigger CI

* fix: force CI refresh after path_routing_tests dedup

* test: add E2E test for stuck job repair and tool rebuild cycle

Tests the full self-repair flow requested in review:
1. Job transitions Pending -> InProgress -> Stuck
2. detect_stuck_jobs() finds it (zero threshold)
3. repair_stuck_job() recovers it back to InProgress
4. A broken tool is repaired via MockBuilder
5. Verify builder was invoked and repair succeeded

Uses a MockBuilder (impl SoftwareBuilder) that returns successful
BuildResult without requiring an LLM or filesystem. Uses libsql
test database for the store (increment_repair_attempts, mark_tool_repaired).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(self-repair): measure stuck_duration from Stuck transition, not started_at

- Use ctx.transitions to find the most recent Stuck transition timestamp
  instead of ctx.started_at (which reflects job start, not stuck time)
- Fix StuckJob.last_activity to use stuck transition timestamp
- Remove misleading "hot-reloaded into registry" log
- Remove stray "// ci fix" comment in memory.rs
- Add regression test: backdated started_at must not inflate stuck_duration
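
The fix above can be illustrated with a small sketch that measures from the most recent Stuck transition rather than from job start. The `Transition` struct, its field names, and the use of `std::time::SystemTime` are assumptions; the project's actual types are not shown in this PR page.

```rust
use std::time::{Duration, SystemTime};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum JobState {
    Pending,
    InProgress,
    Stuck,
}

/// Hypothetical transition record: the state entered and when.
pub struct Transition {
    pub to: JobState,
    pub at: SystemTime,
}

/// Time elapsed since the most recent transition into `Stuck`, if any.
pub fn stuck_duration(transitions: &[Transition], now: SystemTime) -> Option<Duration> {
    transitions
        .iter()
        .rev() // newest transition first
        .find(|t| t.to == JobState::Stuck)
        .and_then(|t| now.duration_since(t.at).ok())
}

/// True only when the job has been stuck for at least `threshold`.
pub fn exceeds_threshold(transitions: &[Transition], now: SystemTime, threshold: Duration) -> bool {
    stuck_duration(transitions, now).map_or(false, |d| d >= threshold)
}
```

With this shape, a job that started long ago but became stuck only seconds ago reports a short stuck duration, which is the case the backdated `started_at` regression test checks.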

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: re-trigger CI with latest changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add type annotation to Ok(()) in test to resolve E0282

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added scope: agent Agent core (agent loop, router, scheduler) scope: tool Tool infrastructure size: XL 500+ changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Mar 19, 2026

claude bot commented Mar 19, 2026

Code review

Found 1 issue:

  1. [HIGH:100] Missing builder field in AgentDeps construction in tests/e2e_telegram_message_routing.rs:183. The PR adds a required builder field to the AgentDeps struct (and updates all test fixtures), but this E2E test file's AgentDeps construction was not updated. This will cause a compilation failure.

let deps = AgentDeps {
    owner_id: components.config.owner_id.clone(),
    store: components.db.clone(),
    llm: components.llm.clone(),
    cheap_llm: components.cheap_llm.clone(),
    safety: components.safety.clone(),
    tools: components.tools.clone(),
    workspace: components.workspace.clone(),
    extension_manager: components.extension_manager.clone(),
    skill_registry: components.skill_registry.clone(),
    skill_catalog: components.skill_catalog.clone(),
    skills_config: components.config.skills.clone(),
    hooks: components.hooks.clone(),
    cost_guard: components.cost_guard.clone(),
    sse_tx: None,
    http_interceptor: None,
    transcription: None,
    document_extraction: None,
};

Lines 183-201: AgentDeps construction missing builder field. Other test files (test_rig.rs, gateway_workflow_harness.rs) were updated to include builder: None, but this one was missed.


claude bot commented Mar 19, 2026

Additional findings

Found additional issues in telegram message splitting tests:

  1. [MEDIUM:75] Test assertion will fail in test_split_message_word_boundary. The implementation intentionally drops whitespace at split boundaries (as documented in the code comment: "this drops leading/trailing spaces at split points"), but the test expects exact equality after rejoining: assert_eq!(rejoined, text). When splitting at line/paragraph boundaries or sentence ends, the intermediate newlines/spaces are lost, making this assertion incorrect.

        // Trim whitespace at chunk boundaries for clean Telegram display.
        // Note: this drops leading/trailing spaces at split points, which is
        // acceptable for chat messages but means the concatenation of chunks
        // may not exactly equal the original text when split at spaces.
        chunks.push(remaining[..split_at].trim_end().to_string());
        remaining = remaining[split_at..].trim_start();
    }
    chunks
}

Lines 432-441: The comment states "Rejoined chunks must equal the original text exactly", but this contradicts the behavior documented at lines 76-77, where whitespace is dropped at split boundaries.
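
One way a test could tolerate the trimmed whitespace is to compare whitespace-normalized text instead of exact bytes. The helpers below are a hypothetical sketch, not code from the PR, and the check only holds when every split falls on whitespace: a hard cut inside a word would introduce a space on rejoin and make the comparison fail.

```rust
/// Collapses every run of whitespace into a single space and trims the ends.
pub fn normalize_ws(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// True when the chunks, rejoined with spaces, contain the same words in the
/// same order as the original text (hypothetical test helper).
pub fn chunks_preserve_words(original: &str, chunks: &[&str]) -> bool {
    normalize_ws(&chunks.join(" ")) == normalize_ws(original)
}
```

A test written this way still catches dropped or reordered words while ignoring the whitespace that the implementation intentionally discards.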


claude bot commented Mar 19, 2026

Performance & Production Issues

Found additional performance concerns:

  1. [HIGH:HIGH] O(n) transition vector scan on every stuck job detection. In src/agent/self_repair.rs:123-128, the code calls .iter().rev().find() on the job's transitions vector for every stuck job. For jobs with hundreds of state transitions, this becomes expensive at each detection interval. Consider storing stuck_at: Option<DateTime> directly on JobContext to avoid repeated scans.

for job_id in stuck_ids {
    if let Ok(ctx) = self.context_manager.get_context(job_id).await
        && ctx.state == JobState::Stuck
    {
        let stuck_duration = ctx
            .started_at

  2. [MEDIUM:MEDIUM] Missing timeout on builder.build() call in repair task. In src/agent/self_repair.rs:261, the async builder.build() call has no explicit timeout. If the builder hangs, the entire repair task hangs, blocking the repair interval loop from continuing. Add tokio::time::timeout() wrapper.

  3. [MEDIUM:HIGH] Multiple redundant UTF-8 scans in split_message. In channels-src/telegram/src/lib.rs:394-399, the code calls char_indices().take(4096) on every window iteration. For a 1MB message, this repeats ~244 times with full re-scans. Consider caching or using skip() to avoid recomputation.
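
The first suggestion above (storing the stuck timestamp directly on the job context) might look roughly like this. The field name `stuck_at` comes from the review comment, but the `transition_to` method, the trimmed-down `JobContext`, and the use of `std::time::SystemTime` in place of a DateTime type are assumptions for illustration.

```rust
use std::time::{Duration, SystemTime};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum JobState {
    InProgress,
    Stuck,
}

/// Reduced stand-in for the job context; the real struct has more fields.
pub struct JobContext {
    pub state: JobState,
    pub stuck_at: Option<SystemTime>,
}

impl JobContext {
    /// Records the timestamp when entering `Stuck` and clears it otherwise
    /// (hypothetical method).
    pub fn transition_to(&mut self, state: JobState, at: SystemTime) {
        self.stuck_at = match state {
            JobState::Stuck => Some(at),
            _ => None,
        };
        self.state = state;
    }

    /// Constant-time lookup; no scan over the transition history.
    pub fn stuck_duration(&self, now: SystemTime) -> Option<Duration> {
        self.stuck_at.and_then(|at| now.duration_since(at).ok())
    }
}
```

The trade-off is one extra field that must be kept in sync on every state change, in exchange for a detection pass that does not depend on how many transitions a job has accumulated.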

henrypark133 and others added 4 commits March 18, 2026 23:38
…tion (#1400)

- Add `builder: None` to AgentDeps initializer in e2e_telegram_message_routing
  test (field added in #712 but test not updated)
- Update go_to_extensions() in test_telegram_hot_activation to navigate via
  settings tab -> extensions subtab (extensions tab was moved to settings)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: navigate telegram E2E tests to channels subtab

wasm_channel extensions (like telegram) are now rendered in the
Settings → Channels subtab, not the Extensions subtab. Update
test_telegram_hot_activation to navigate there and use the correct
card selector. Also mock /api/gateway/status which loadChannelsStatus
fetches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: select telegram card by name, not first card in channels subtab

Built-in channel cards (Web Gateway, HTTP, etc.) render first in the
channels subtab content, so .first matches them instead of the
telegram extension card. Select by has_text="Telegram" to target
the correct card.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: make gateway_status_handler parameterizable in mock helper

Address review feedback: extract default gateway status handler and
accept an optional gateway_status_handler kwarg in mock_extension_lists
for test flexibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…6242

chore: promote staging to staging-promote/b9e5acf6-23283208580 (2026-03-19 15:15 UTC)
…8580

chore: promote staging to staging-promote/3dcccc1e-23280048384 (2026-03-19 06:44 UTC)
@henrypark133 henrypark133 merged commit e582166 into staging-promote/ec04354c-23271447493 Mar 19, 2026
13 checks passed
@henrypark133 henrypark133 deleted the staging-promote/3dcccc1e-23280048384 branch March 19, 2026 15:58
Labels

contributor: core 20+ merged PRs risk: medium Business logic, config, or moderate-risk modules scope: agent Agent core (agent loop, router, scheduler) scope: tool Tool infrastructure size: XL 500+ changed lines staging-promotion

2 participants